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1 . Introduction 

The purpose of this report is to review developments in 
objective analysis of meteorological fields. The first part 
will review the process known as "optimum interpolation" and 
observe that if coincides with schemes developed independently 
in other scientific disciplines. The term "optimum" arises 
from the fact that the expected mean squared error over some 
ensemble of realizations (e.g., over time) is minimized. The 
term "interpolation" is misleading since it refers to inferring 
values at points other than data points, but not, however, 
by a scheme that necessarily reproduces given values at the 
data points. Because of this fact, it would probably be 
better to call the process "optimum approximation". However, 
we follow the meteorological literature and retain the term 
"optimum interpolation". 

The second basic thrust of this report is to discuss 
Cressman's scheme of successive approximations and show that 
a certain variant of the scheme will converge to the same 
result given by optimum interpolation. Use of this process 
could be advantageous from a computational viewpoint, compared 
to optimum interpolation. 

In Section 2 we derive the optimum interpolation scheme 
and show the functional form of the approximation. We address 
some computational aspects and recent developments in Section 
3. In Section 4 Cressman's successive correction scheme is 
discussed, including a statistically motivated variant of it. 
The final section is devoted to showing that a suitable 
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variant of a successive corrections scheme will converge to 
the same function as given by optimum interpolation. 

2 . The Functional Form of Optimum Interpolation 

The development of optimum interpolation in meteorology 
dates back to Gandin [12] and is based on Wiener-Kolmogorov 
theory in time series analysis. Various disciplines have used 
similar schemes for some time, apparently developed indepen- 
dently. We have discovered references to developments in 
geology /mining (where it is usually called kriging) [1] , [20] , 

[22], [23], [27], photo gramme try [19], geodesy [15], statistics 

(where it is called stochastic process prediction) [32]/ and 
electrical engineering (where it is sometimes called a Wiener 
filter) [8 ] . 

We derive the general form of optimum interpolation and 
show the form of the interpolation function. While the latter 
is known, it is not well known and is even disavowed in print 
in one paper [reply to 2] . This situation has probably occurred 
because the principal interest is to obtain a grid of points 
from scattered observations and not to obtain an approximating 
surface. However, the form of the equation of the surface is 
interesting and revealing. 

Let X be the independent variable, and Z (X) be a random 
function whose value is to be estimated from known or measured 
values Z(X 1 ), Z(X 2 ), ..., Z ( 2^j) at scattered points, X]_ , •••/ 

X^. We denote the expected value of Z (X) , E[Z(X)] by m(X) . 

This mean value as a function of position is called the trend 
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surface, and depending on the source of the data, may be 
assumed to have a particular form, or to be zero. We will 
assume that the trend surface is given by 

n 

m(X) = l c f (X) , (n < N-l) 

k=0 K 

where the f^.(X) are known linearly independent functions 
(with unknown coefficients, c^) . For meteorological applica- 
tions, Z represents a residual (deviation from climatology, 
say) which is assumed to have zero mean, and thus m(X) = 0. 

We include the term for completeness in our development. 

A number of assumptions will be made concerning the dis- 
tribution of the random function Z(X). We want to estimate 
Z(X) by a linear predictor, 

N 

z ( X) = l X. Z (X . ) . 

j=l J ~ J 

We assume that the optimum predictor is linear in the observed 
values, which is the case if the distribution is Gaussian, but 
not necessarily otherwise. In principle the following process 
can formally be carried out without assumptions about the 
covariance function 

(1) C ( X, Y) = E [ ( Z ( X) - m(X) ) (Z (Y) -m(Y))] . 

In practice, and for computational reasons it is convenient 
to make the assumptions of stationarity and isotropy for the 
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covariance function. The net effect of these assumptions is 
that the covariance function C(X,Y) is a function of the distance 
between X and Y only, not X,Y, or their relative positions 
other than distance between them. These assumptions probably 
do not hold in meteorological applications; for example, 
prevailing winds will certainly tend to give a distortion from 
isotropy and various landforms will give a distortion from 
stat ionarity . 

We want to estimate the value of Z at X; let us call that 
estimate Z (X) . We will do this by minimizing E[(Z(X) -Z(X))^] 
where 



N 

Z (X) = l A . Z (X . ) , 

j=l D 11 



subject to some conditions which guarantee unbiased estimates. 

For example if Z (X) has an unknown constant mean, E(Z(X)) = c , 

N 

then the constraint £ A . = 1 is needed to guarantee the estimate 

j=l J 

is unbiased. In the general case, the constraints to be im- 
posed are 



( 2 ) 



N 



j = l 



Ajf^Xj) = f k (X) , k = 0,1, 



, n . 



Note that this implies we must have n < N and further it can 
be deduced that if the data lies on the trend surface, so will 
the estimated point (provided the estimate is unique, a standard 
assumption ) . 
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In meteorological applications it is assumed the measure- 



ments ("known" values, Z (X^) ) are subject to errors, hence the 
measured values are Z(X^) + e(X^) . We assume the errors are 
Gaussian with mean zero , and are independent of the function 
Z (X) . We denote the covariance function for the errors by 
C £ (X, Y) . Ultimately we assume the errors are independent, 
so that we will have C (X^X-;) = 0 ^ 6(X-X.), where 6(0) = 1, 

£ -L £ ^ ' 1 

6 (X) =0, X ^ 0. For the derivation, however, we will allow 

the more general covariance function. We note that in the case 

of satellite data, for example, the assumption of independence 

and zero mean will probably not be satisfied. 

~ 2 

To minimize E[(Z(X) - Z (X) ) ] subject to the constraints 

(2) , we use Lagrange multipliers, 2y^, obtaining the objective 
function 



N n N 

(3) E [ ( Z (X) - l A . (Z (X . ) + e (X . ) ) } ] + l 2y ( £ A f (X ) -f k (X)) 

- ji ]. 3 ~3 ~3 k=0 k j=l 3 X -j k - 



Before differentiation, we write this as 



(4) E[(Z(X) -m(X)) - 2(Z(X) - m(X) ) £ A . ( Z (X . ) + e ( X . ) - m (X ) ) 

j=l 33 J J 



+ ( l A . ( Z (X . ) + e (X . ) - m (X . ) ) ) 2 ] 
j=l 3 -3 ~3 



n N 

+ l 2y ( l A f (X.) - f k (X) ) • 
k =0 k j=l 3 k 3 k 
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Upon taking partial derivatives with respect to the A^ and 
y^/ and simplifying somewhat, we obtain 



( 5 ) 



l IC(X i ,X.) 

j J J 




i = 1,2 



,N, 



y k f k ( -i ) 



C(X,X i ) 



and the constraint equations (2) . 

In matrix form the system of (linear) equations to be 
solved is conveniently represented in partitioned form as 




where 

M = (C(X, ,X.) + C (X. ,X.)) i, j = 1,...,N 

1 J £ 1 J 

F = ( f k^-i^ k = °'***' n i = !/ — / N 

0 is a zero matrix of order (n+1) x (n+1) 

y — (y 0 , . . . ,y R ) 

Vf, = (C(X,X i ), ..., C(X,X N )) t , and 

If = (f 0 (x)/ •••/ f n (x)) t . 
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Letting G denote the coefficient matrix, we have that 
(formally) the solution is 




Letting d = (Z (X^) + e (X^) , . . . , Z (X^ + e (X^j) , 0 , . . . , 0) represent 
the data vector, we obtain 



N 

(7) Z (X) = l X. (Z(X.) + e(X.) 

j=l J J 3 




We will give an alternative interpretation of Z. Note that 

^ depends on the value of X as well as the data points 

X-^,...,X N / while d is independent of X. Since G is symmetric, 
so is G 1, and we have 




Z(X) 





Now, G ^d is the solution of a certain system of equations, 
namely 



(8) Ga = d, 

where a = (A 1 ...A N This represents Z(X) in the 
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form 



(9) Z (X) = (^G -1 ) 

N 

= l A.C ( 
i=l 1 

The system of equations (8) can be thought of as arising from 
the requirements that an approximation consisting of a linear 
combination of the functions C(X,X^) + C £ (X,Xj_)' i = 
and f k (X) , k = 0,...,n be required to interpolate the data, 
Z(X.) + £ ( X ^ ) , i = 1,...,N, along with constraints analogous 
to exactness for the f^(X), 

N 

l A.f. (X.) = 0, k = 0,1 , . . . ,n . 

j=l 3 K 3 

N 

Of course the terms £ A.C (X,X.) represent interpolation to 

i=l 1 e 1 

the error function and are then dropped to obtain (9) . View- 
ing things from this perspective, the computation of Z at a 
number of different points is simplified, provided the error 
estimate given by optimum interpolation is not to be computed. 
We address this briefly in the next section. 

The point of view afforded by (9) makes it apparent that 
"regionalizing" the process by choosing (from a larger set) 
data points near the X of interest must lead to a discontinuous 
surface which may, in turn, lead to unnecessary and unwanted 
disturbances. Phillips [31 ] addresses this problem when dis- 
cussing combined analysis and initialization (or perhaps. 



m - -a 



n 



K,X ) + l b f (X) 
1 k=0 K K 
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better to say, analysis which does not require initializa- 
tion) . See also Williamson and Daley for an iterative approach 
to overcoming this problem. 

In meteorological applications the error covariances are 

assumed independent [see, e.g., 7], in which case C £ (X,}^) = 

2 2 

o 6 (X - X. ) , where a is the variance of the error at X.. 

e i ~ _1 e i - 1 

In this instance the matrix M differs from the matrix (C(X,Xj)) 

only in that the diagonal terms are augmented. This has a 

beneficial effect in terms of the condition number of G and 

hence the numerical process of solving (6) or (8) . 

The equations we have derived are for a single dependent 

variable Z (X) . In both meteorology and geology simultaneous 

treatment of related dependent variables has been derived and 

is used. If all dependent variables are measured at each data 

point, X^, and if the sum of the expected squared errors for 

the dependent estimates is to be minimized, the final result 

is formally the same as equation (6) . However, each entry in 

M, and each of Z(X.) and A. must be vectors, and the entries 

in the matrix F are replaced by block diagonal matrices. See 

Myers [24] for details, where the process is called co-kriging. 

In meteorology not all variables are measured at each data 

point. The complication this causes is readily resolved, 

although it is simpler to group variables instead of points. 

It is called multivariate optimum interpolation [7], a confusing 

term to those outside the field since the "multivariate" 

refers to the dependent variables, not the independent variables. 

Of course, cross covariances between the dependent variables 
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are required. See [7] and [21] for the development in 
meteorological terms. 



As a matter of interest, we observe that the process is 
somewhat reminiscent of Lagrange interpolation, with the X_. 
playing the part of the fundamental Lagrange polynomials. 



the values of the fundamental Lagrange polynomials at the 



to solving for coefficients in the interpolation polynomial 
expressed as a linear combination of polynomials. 

3. Practical Considerations and Recent Results 

One of the most important aspects of optimum interpolation 
is the appropriate specification of the covariance function, 
C(X,Y). In meteorology this has been treated by a number of 
authors, [3], [9], [33]. The importance of this has been 

recently noted by Franke [11], Hollingsworth [16], and Lorenc 
[21] from a practical point of view. In theory, Yakowitz and 
Szidarovszky [43] have shown that (in the absence of measure- 
ment errors) the approximation converges as the set of data 
points becomes dense. Within some limitations this result 
holds even if the covariance functions are wrong. 

The error estimate for optimum interpolation can be shown 
(by substitution) to be 



Thus, 





E[ (Z (X) - Z (X) ) 2 ] 



C(X,X) - (2Q t m 1 



-Q t m" 1 Q t )V c , 
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where Q = I-F(F t m l F) 1 F t m 1 . If E[Z(X)] = 0, then F = 0 
and the above reduces to the more familiar form 

E[(Z(X) -Z(X)) 2 ] = C ( X, X) - vV^V . 

— — — — — c — c 

As noted by Yakowitz and Szidarovsky [43] , this estimate is 
good only if the covariance functions are correct. They show 
that error estimates with incorrect covariance functions may 
be so poor as to not converge to zero as the data points be- 
come dense, even though the approximation converges. The 
net result of this is that one should not place too much faith 
in the error estimates. The covariances assumed are almost 
certainly wrong, and the more drastic effect is on the error 
estimate rather than the approximation itself. 

Computationally the choice between solving (6) , then 
evaluating Z (X) by (7), or solving (8), then evaluating 
Z (X) by (9) depends on two things: (i) If the error estimate 

is also to be computed, (6) - (7) is cheaper; (ii) If the 
error estimate is not to be computed, then (8) - (9) is cheaper, 
except in the instance of only one evaluation of Z (X) . 

In meteorological applications it is impossible to 
sider all data points at once. This leads to some selection 
process based on the "most important" observations. Often the 
closest points are considered the most important. Another scheme 
is to retain the points corresponding to the larger terms in the 
covariance matrix (C (Xj_, Xj ) ) • Since the importance of a point 
depends on the entries in the inverse of the matrix rather 
than the matrix itself, this does not seem to be a good scheme. 
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A reasonable choice is probably to use "closest points" in 
some norm which accounts for prevailing influences, e.g., 
winds . 

Several recent papers of practical and theoretical inter- 
est relate optimum interpolation to conventional approximation 
theory. Among these are Kimmeldorf and Wahba [17], Matheron 
[24], Salkauskas [33], and Wahba [40]. The significance of 
the result for meteorological applications is discussed by 
Wahba. For errors which have common variance and covariance 
functions which have a finite square integral (over the entire 
plane) , optimum interpolation leads to the function Z (X) given 
by (7) . This function is also the solution of a variational 
problem in the spatial domain, that is: Find Y(X) (in a cer- 

tain reproducing kernel Hilbert space) to minimize 

N 2 
l (Z(X.) + e . - Y(X.) ) + AJ(Y) , 

j=l 33 3 

where A is the ratio of the variance of the errors to the 
variance of the random function Z(X) (= C(X,X)) , and J (Y) 
is the square norm of Y in the Hilbert space. By appropriate 
choice of the functional J(Y), new methods can be obtained 
which minimize or eliminate contributions from unwanted modes. 

4 . Cressman's Successive Correction Scheme 

Cressman's scheme [10] and variations of it [4] are often 
used for scattered data in meteorology. We will develop the 
scheme as a matrix iterative process, and show that it may not 
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converge (although as applied, usually does) if the iteration 
is continued. In the next section, we will show a relation 
between a variant of the Cressman scheme and optimum inter- 
polation. We note that Cressman' s scheme bears a resemblance 
to Shepard's method [35], [13], but its differences are more 

important than its similarities. Most importantly, the gradi- 
ents of the approximating surface are not necessarily zero at 
the data points as they are for Shepard's method. In addition, 
as originally proposed by Cressman, the function is not smooth, 
i.e., does not have continuous partial derivatives. 

Cressman 's scheme achieves a weighted average of the data 
(a convex combination, in fact) as follows. Let the data again 
be denoted by Z(X^), j = 1,...,N, and associate a weight func- 
tion, Wj (X) , with each point X^ . This function is ordinarily 
a univariate function of distance, | | X - X j | | . Cressman 
proposed using 



( 10 ) 




2 



< D 



2 



| |X-Xj| | > D 2 . 



Another scheme [4] is to take 



(ID 



W_.(X) = exp(-| | X - X j I ! 2 /B 2 ) • 



A first approximation is taken to be 




N 



N 



( 12 ) 



l W . (X) Z (X • ) / l W. (X) . 

j=l : 3 j=l 3 
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The denominator normalizes the weights and since the VT (X) 
are positive, Z^ (X) is a convex combination of the data. 

As such, it satisfies the property, min ( Z (X .) ) <_ Z^ (X) <_ 

j 3 ~ 

max ( Z (X . ) ) . 

j 3 

Usually, the process is repeated to correct the differ- 
ences between the data and the approximation Z(X_.) - Z^ ( x j ) , 
but using a smaller "radius". This means using a smaller D 
in (10), or a smaller B in (11). With superscripts denoting 
iterations, we have the following scheme: Let Z^ (X) be given 

by (12) , with (X) replaced by ^ (X) . Then 



(13) Z (k) (X) = Z (k 1) 



N 



(X) + l W. (k) (X) (Z(X.) 
j=l 3 D 



- z 



(k-1) 



N ... 

(X.!/ Y w. (K) (x) , k = 1,2. . . . 
“3 jil 3 



If we look at the sequence of vectors which approximate the 



t ( ki 

data vector, Z= ( Z (X^ . . . Z (X N ) ) , we have {(Z v 1 (X^ . 



(k) 



t ( k) 

(X ) ) }, k = 0,1,... . Denote these vectors by Z_ , and 



-N 



define the matrix 



(14) 



H 



(k) 



w. (k) (X.) 

J =i_ 



N 



(k) 



y w v ' x. 

• p —1 

P=1 * 



, i , j 1,...,N< 



Then the iteration takes the form 



(15) 



Z<°> = H (0) z 



(k) _ z (k-l) x „(k) 



+ H v ; ( Z - Z 



(k-1) 



), k— 1,2,. 
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Formally, the latter can be written as 



Z_( k ) = H (k ^Z+ (I - H ) Z (k_1 ) , 



and thus 



Z - Z 



00 _ 



(I - H (k) ) 



(Z - 



z (k_ !) ) 



k = 1,2 



This easily leads to 

k 

(16) Z-Z (k) = [ n (I - H (p) ) ] (Z - z (0) ) , 

P=1 

and we see that this iteration converges provided that, for 
sufficiently large k, the norms of the I-H ^ are bounded by 
a constant less than one. This holds for any norm; hence, if 
all eigenvalues of the I-H have magnitude bounded by a 
constant less than one, for sufficiently large k, convergence 
is obtained. 

Generally, the effect of decreasing D in (10) or B in (11) 

is to increase the relative size of the diagonal elements in 
(k) 

H . Since each row sums to one, the matrix will eventually 

become diagonally dominant. In any case, if all eigenvalues 

are positive (as for (11) , for example) , the eigenvalues of 
(k ) 

I-H are then bounded by a constant less than one, inde- 
pendent of k, provided B is a decreasing function of k. 

The situation for weights given by (10) is not so pleasant. 

(k) 

In this case, the matrix H may have negative eigenvalues, 

( k) 

which leads to I-H having eigenvalues greater than one. 
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As D decreases, all eigenvalues do become positive. Whether 
or not this happens for values of D used in practice should 
be investigated since this has a bearing on the stability of 
the iteration. 

The effect of decreasing D in (10) or B in (11) is to speed 

convergence of the iteration, since this tends to increase the 

(k) (k) 

eigenvalues of H . We note in passing that H v is a sto- 
chastic matrix, and thus has its largest eigenvalue equal to 

T 

one, with eigenvector (l,...l) . In terms of the approxima- 

tion, this means convergence in one iteration if Z is a con- 
stant vector, i.e., = (c,...,c) t , c is a constant. The 

maximum/minimum principle cited earlier implies the scheme is 
exact for constants however. 

In the next section we discuss the general form of the 
approximation, and show that under certain simple modifications 
the scheme approximates optimum interpolation. 



5 . Relation of a Variant of Cressman's Scheme to Optimum 
Interpolation 

The general form of Z ^ ^ (X) in (12) is a rational function 
in the weights, W^ (X) , and if the scheme is iterated as in (13) , 
the form is that of a sum of functions, each rational in the 



appropriate set of weights W. 



(k) 



, k = 0,1,.. 



If weights 



were taken to be the functions C(X,X.) +C (X,X-) in (1) , for 

0 £ D 

all k, the resulting approximation bears some relationship to 
optimum interpolation, although it is rational in the covari- 
ance functions rather than linear in them. However, if the 

( k) 

denominators of (12) and (13) (and hence, of H ) are replaced 
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by a suitable constant, the iteration will converge to the 
optimum interpolation function. 

First we consider the covariance matrix 



(17) 



M = (C(X.,X ) +C £ (X i ,X)), i,j = 1,..., 



N 



This matrix must be positive semidefinite and we make the 
usual assumption that it is definite, i.e., has no zero eigen- 
values. Let | | M | | denote the max row sum norm of M, and let 
3 be a constant satisfying 3 > |m| | . Now consider the 

iteration obtained by replacing the denominators in (12) and 
(13) by 3/ which leads to the matrix iteration analogous to 
(15) 



(18) 



(0) - H 



(k) = ^(k 1) + i-M(z.-z. (k 1) )' k = 1/ 2 , 



This leads to the analogue of (16) , 



(19) 



Z - Z (k) = (I -^M) k (Z - Z (0) ) 



Thus, convergence is obtained whenever has all eigen- 

values strictly bounded by one. Since the eigenvalues of j^-M 
must be bounded by | | ^ M | | < 2 by our choice of 3/ convergence 



(k) 



is obtained. The form of each Z K (X) is a linear combination 

of the C(X,X.) + C (X,X.) . Because convergence implies agree- 
3 t 3 

ment at the data points, we see that if the error covariances 
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2 

have the form noted before, C £ (X,X^) = a £ 6 (X - X^) , the 
limiting approximation given by (18) agrees exactly with that 
given by (8) - (9) , except at the X^, where a jump occurs to 
yield "interpolation." Of course one thinks in terms of 
dropping that term for the final approximation, but must do 
so only if an evaluation point coincides with a grid point. 
Practically speaking, the reverse situation is where the 
special instance is encountered. 

The rate of convergence may be slow because of the like- 
lihood of M having small eigenvalues, leading to I - having 

p 

eigenvalues close to one. However, the presence of the error 
covariance tends to increase the eigenvalues of M, and in this 
respect, large observational errors would benefit the conver- 
gence rate. It would seem best to try to choose 8 to minimize 
the magnitude of the eigenvalue of I-^M of largest magnitude. 

p 

This would maximize the rate of convergence. On the other 
hand, most of the significant information may correspond to 

large eigenvalues of M. (Recall that one is an eigenvalue of 

( k) t 

H with eigenvector (1,1,..., 1) .) In this case, it would 

make sense to take 8 ~ j | M 1 1 which would cause rapid conver- 
gence for these modes, while modes corresponding to small 
eigenvalues are of high frequency and could be best filtered 
out . 

The filtering potential of this scheme should be investi- 
gated further to determine whether or not the eigenvectors 
corresponding to small eigenvalues do indeed lead to unwanted 
noise in the approximation which later must be filtered out. 
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If so, this scheme could be an advantageous one to use. 

Some simulations of the scheme have been carried out through 
the iteration process. However, the results are nontrivial 
to interpret and need additional study, particularly in the 
light of the filtering scheme presently used in the opera- 
tional model. The combination of including the measurement 
errors and the constant normalization factor will result in 
the successive correction method appearing more like optimum 
interpolation. A multivariate scheme could be derived in 
a straightforward fashion. 
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